Automatic Paragraph Identification: A Study across Languages and Domains
Abstract
In this paper we investigate whether paragraphs can be identified automatically in different languages and domains. We propose a machine learning approach which exploits textual and discourse cues and we assess how well humans perform on this task. Our best models achieve an accuracy that is significantly higher than the best baseline and, for most data sets, comes to within 6% of human performance.
Related papers
The Comparative Effect of Using Idioms in Conversation and Paragraph Writing on EFL Learners’ Idiom Learning
This study investigated the comparative effect of teaching idiomatic expressions through practicing them in conversation and paragraph writing on intermediate EFL learners’ idiom learning. The participants were sorted out of a population of 134 intermediate students in Zabansara Language School in Khorramabad based on their scores on a Preliminary English Test (PET) and an idiom test piloted in...
Comprehension across Application Domains and Languages
This work demonstrates that our natural language understanding framework can be applied across application domains and languages with ease. Approaches towards language understanding generally involve much handcrafting, e.g. in writing grammars or annotating corpora, hence portability is a desirable trait in the development of language understanding systems. Our framework for natural language un...
Language Complexity, Accuracy and Fluency in Different Types of Writing Paragraph: Do the Raters Notice Such Effect
The aim of the present study was to investigate the effects of two types of paragraph on EFL learners’ written production. It addressed the issue of how three aspects of language production (i.e. complexity, accuracy, and fluency) vary among two types of paragraphs (i.e. paragraphs of chronology and cause-effect) written by EFL learners. Thirty intermediate level learners of English participate...
Acquiring entailment pairs across languages and domains: A Data Analysis
Entailment pairs are sentence pairs of a premise and a hypothesis, where the premise textually entails the hypothesis. Such sentence pairs are important for the development of Textual Entailment systems. In this paper, we take a closer look at a prominent strategy for their automatic acquisition from newspaper corpora, pairing first sentences of articles with their titles. We propose a simple l...
Kohonen Self Organizing for Automatic Identification of Cartographic Objects
Automatic identification and localization of cartographic objects in aerial and satellite images have gained increasing attention in recent years in digital photogrammetry and remote sensing. Although the automatic extraction of man made objects in essence is still an unresolved issue, the man made objects can be extracted from aerial photos and satellite images. Recently, the high-resolution s...